Filtering

For the Flow Cytometry (FC) and Resistance (RES) data, we filter guides in two steps:

  1. Filter out genes whose log-fold change (LFC) shows low replicate correlation and/or high variability. If a gene was measured under multiple conditions (cell lines or drugs), keep only the best-correlated condition.
  2. Filter out guides whose LFC variance across replicates is more than 3 standard deviations above the mean variance, or that show no variability at all.

Human Flow Cytometry Data

Gene Filtering

Cell     Gene  Avg. Rep. Correlation  Avg. Variance  Avg. Avg. LFC  Guides  Avg. LFC Variance  Dip Test P-Value  Remove
MOLM-13  CD15  0.9662890              0.0240898      0.4856672      384     0.6922808          0.0000            FALSE
TF-1     CD13  0.9570524              0.0646212      0.5501278      480     1.4466709          0.0000            FALSE
MOLM-13  CD33  0.9429190              0.0586406      1.8360495      192     0.9571894          0.9565            FALSE
NB4      CD13  0.9586049              0.0566933      0.7731802      480     1.2946018          0.6685            TRUE
NB4      CD33  0.9098274              0.1258232      1.5708943      192     1.2518799          0.9355            TRUE

  • Guides targeting CD13 are more variable in TF-1 cells, which gives better separation between good and bad guides, so CD13 is kept in TF-1 rather than NB4
  • Technical replicate correlation of CD33 is lower in NB4 than in MOLM-13

Note: Replicates for FC data are technical, not experimental.

Guide Filtering

  • Filter out any guides with high (or no) variability across replicates
  • Guides are considered highly variable if their variance is more than 3 standard deviations above the mean variance
  • Removed 10 guides
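The guide-level rule can be sketched as follows. This sketch assumes each guide's replicate LFCs are available as a list, uses the sample variance, and treats the 3-SD cutoff as one-sided, so only unusually high variance is removed. `filter_guides` and the toy data are hypothetical.

```python
from statistics import mean, stdev, variance


def filter_guides(guide_lfcs: dict[str, list[float]], n_sd: float = 3.0) -> dict[str, list[float]]:
    """Keep guides whose replicate LFC variance is > 0 and <= mean + n_sd * SD.

    guide_lfcs maps a guide ID to its LFC in each replicate (hypothetical layout).
    Needs at least 2 replicates per guide and at least 2 guides.
    """
    # Sample variance of each guide's LFC across replicates
    variances = {g: variance(lfcs) for g, lfcs in guide_lfcs.items()}
    values = list(variances.values())
    # One-sided cutoff: mean variance plus n_sd standard deviations
    cutoff = mean(values) + n_sd * stdev(values)
    # Drop guides with no variability (variance == 0) or very high variability
    return {g: lfcs for g, lfcs in guide_lfcs.items() if 0.0 < variances[g] <= cutoff}


# Toy data (not from the screens): 20 well-behaved guides, one noisy, one flat
guides = {f"g{i}": [i * 0.1, i * 0.1 + 0.1] for i in range(20)}
guides["noisy"] = [0.0, 10.0]
guides["flat"] = [1.0, 1.0]

kept = filter_guides(guides)
print(sorted(set(guides) - set(kept)))  # ['flat', 'noisy']
```

With fewer than about 11 guides, a single outlier cannot lie 3 sample SDs above the mean, so the rule only bites on reasonably large guide sets like the ones in these screens.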


Mouse Flow Cytometry Data

Gene Filtering

Gene  Avg. Rep. Correlation  Avg. Variance  Avg. Avg. LFC  Guides  Avg. LFC Variance  Dip Test P-Value  Remove
H2-K  0.8211892              1.2035568      2.1761582      235     5.6704374          0.0040            FALSE
CD45  0.8140521              1.2587999      1.1741811      584     5.6286802          0.0000            FALSE
Cd5   0.7660277              2.0192704      1.0007152      423     6.9193161          0.0005            FALSE
Cd43  0.7573723              1.5938866      0.5684481      637     5.2267523          0.0000            FALSE
Thy1  0.7461000              2.0896311      0.7131360      363     6.4481478          0.0005            FALSE
Cd28  0.5857841              6.8655039      -1.0380493     438     10.8284672         0.4175            FALSE
Cd53  0.5657920              0.1380933      0.4814790      238     0.2021161          0.5420            TRUE
Cd3e  0.4469605              2.7745785      0.2972853      260     2.6989556          0.9465            TRUE
Cd2   0.1999299              0.2130592      0.1159250      191     0.0885697          0.8120            TRUE

  • Cd53: low LFC variance
  • Cd2, Cd3e: low replicate correlation
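To make the gene-level decisions above concrete, the sketch below flags genes using two thresholds on the table's Avg. Rep. Correlation and Avg. LFC Variance columns. The cutoffs (0.5 and 1.0) are hypothetical. The report does not state its thresholds, and these values were picked only because they reproduce the Remove column of this table.

```python
# (gene, avg. replicate correlation, avg. LFC variance) from the mouse table
MOUSE_GENES = [
    ("H2-K", 0.8211892, 5.6704374),
    ("CD45", 0.8140521, 5.6286802),
    ("Cd5", 0.7660277, 6.9193161),
    ("Cd43", 0.7573723, 5.2267523),
    ("Thy1", 0.7461000, 6.4481478),
    ("Cd28", 0.5857841, 10.8284672),
    ("Cd53", 0.5657920, 0.2021161),
    ("Cd3e", 0.4469605, 2.6989556),
    ("Cd2", 0.1999299, 0.0885697),
]

# Hypothetical cutoffs: not stated in the report, chosen only for illustration
MIN_REP_CORRELATION = 0.5
MIN_LFC_VARIANCE = 1.0


def flag_gene(rep_corr: float, lfc_var: float) -> list[str]:
    """Return the reasons a gene would be removed (empty list means keep)."""
    reasons = []
    if rep_corr < MIN_REP_CORRELATION:
        reasons.append("low replicate correlation")
    if lfc_var < MIN_LFC_VARIANCE:
        # Too little spread in LFC to separate good guides from bad ones
        reasons.append("low LFC variance")
    return reasons


for gene, corr, var in MOUSE_GENES:
    reasons = flag_gene(corr, var)
    if reasons:
        print(gene, "->", ", ".join(reasons))
```

With these cutoffs, Cd2 is flagged for both reasons because its LFC variance (0.0886) is also low. The notes above list only its low replicate correlation.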

Guide Filtering

Remove highly variable guides (LFC variance more than 3 standard deviations above the mean)

  • Removed 64 guides

Human Resistance Data

Gene Filtering

Drug  Gene     Avg. Rep. Correlation  Avg. Variance  Avg. Avg. LFC  Guides  Avg. LFC Variance  Dip Test P-Value  Remove
6TG   HPRT1    0.9296532              0.5917359      3.6539692      64      7.5110608          0.9915945         FALSE
AZD   MED12    0.8758470              0.4875722      0.6236769      956     3.5307290          0.9935129         FALSE
AZD   TADA1    0.8184488              0.6957460      -0.6849490     111     3.1061818          0.0574928         FALSE
AZD   TADA2B   0.7575317              0.6708192      -0.1506191     201     2.1925174          0.9914553         FALSE
AZD   CCDC101  0.7042762              0.9569480      -0.2300276     160     2.4852626          0.9866788         FALSE
PLX   NF2      0.6519871              1.4997459      -0.0615335     226     3.1726561          0.8529861         FALSE
PLX   NF1      0.5680280              1.8320517      -0.9834906     745     2.8341151          0.0414965         FALSE
PLX   MED12    0.7805298              0.7922638      -0.5966506     956     2.9941566          0.8994090         TRUE
PLX   CUL3     0.4344678              2.9605273      -0.6183203     155     2.9995941          0.6150521         TRUE
6TG   PMS2     0.2618040              1.1758838      -4.1004333     262     0.6980747          0.9928136         TRUE
6TG   MSH2     0.2541437              1.8882252      -4.2050059     224     1.1143426          0.9912448         TRUE
6TG   MSH6     0.1981200              1.4471243      -4.2796887     432     0.7176871          0.9909761         TRUE
6TG   MLH1     0.1093879              1.3699372      -4.3180802     256     0.5058294          0.8154171         TRUE

  • PLX/MED12: lower replicate correlation than AZD/MED12
  • CUL3, PMS2, MSH2, MSH6, MLH1: low replicate correlation

Note: Replicates for RES data are experimental, not technical.
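When a gene was screened under several conditions, step 1 keeps only the best-correlated one, as with MED12 here. Below is a minimal sketch using a subset of rows from the resistance table. The function name is hypothetical. It covers only the correlation rule. For CD13 in the human FC data, TF-1 was kept over NB4 because of its higher LFC variance, even though NB4's replicate correlation is slightly higher.

```python
from collections import defaultdict

# (drug, gene, avg. replicate correlation): a subset of rows from the resistance table
CONDITIONS = [
    ("6TG", "HPRT1", 0.9296532),
    ("AZD", "MED12", 0.8758470),
    ("PLX", "NF2", 0.6519871),
    ("PLX", "MED12", 0.7805298),
]


def best_condition_per_gene(rows: list[tuple[str, str, float]]) -> dict[str, str]:
    """Map each gene to the condition (drug) with the highest replicate correlation."""
    by_gene: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for drug, gene, corr in rows:
        by_gene[gene].append((corr, drug))
    # max() on (correlation, drug) tuples picks the highest correlation
    return {gene: max(options)[1] for gene, options in by_gene.items()}


print(best_condition_per_gene(CONDITIONS))
# {'HPRT1': '6TG', 'MED12': 'AZD', 'NF2': 'PLX'}
```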

Guide Filtering

Remove highly variable guides (LFC variance more than 3 standard deviations above the mean)

  • Removed 112 guides

Training Set

Target Cut Site

  • Test filtering at 80%, 90%, and 100%

Off Target Effects

  • Will not filter by off-target effects

Training Set Summary

## [1] 169

Out-of-Frame (OOF) Mutation Rate

FORECasT and inDelphi Predict Similar OOF Mutation Rates
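Under the usual definition, a guide's OOF mutation rate is the share of its repair outcomes whose net indel length is not a multiple of 3, because only those outcomes shift the reading frame. The sketch below computes this rate from a predicted indel-length distribution. It does not call FORECasT or inDelphi. The input format, a signed indel length mapped to a predicted frequency or read count, is an assumption.

```python
def oof_rate(indel_outcomes: dict[int, float]) -> float:
    """Share of predicted indel outcomes that shift the reading frame.

    indel_outcomes maps a signed net indel length (deletions < 0, insertions > 0)
    to a predicted frequency or read count (hypothetical input format).
    """
    total = sum(indel_outcomes.values())
    # Python's % returns a non-negative remainder: -1 % 3 == 2, -3 % 3 == 0
    frameshift = sum(n for size, n in indel_outcomes.items() if size % 3 != 0)
    return frameshift / total


# Toy prediction for one guide (not real tool output)
example = {-1: 40, +1: 20, -3: 25, -7: 15}
print(oof_rate(example))  # 0.75
```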

FORECasT Correlates Better with On-Target Activity for 11 of 13 Genes

FORECasT Correlates Better When Considering All Guides